Systems Thinking in Action - When and How to Use Root Cause Analysis Tools

Systems Thinking in Action
If you are a systems engineer involved in definition, design, implementation, integration, or validation, you have most likely encountered the term "systems thinking" many times. But what does it actually mean? Over the past several decades, many authors and practitioners have defined and redefined systems thinking. One of the most fundamental definitions comes from Barry Richmond, who describes systems thinking as a continuum of activities ranging from conceptual to technical, spanning from a systems-level perspective to implementation-level thinking [1]. Arnold and Wade analyzed numerous definitions of systems thinking in their research [2]. In essence, systems thinking considers how elements, their interrelationships, functions, and dynamic behavior form a whole. Each system serves a goal or objective through the interaction of its components, and the more complex these interactions are, the harder it becomes to identify the root cause of a system issue.
The Engineer’s Toolkit: Making Sense of Root Cause Analysis Methods
To address the challenges that arise from system interactions, systems engineers use a variety of approaches to identify root causes. Root Cause Analysis (RCA) is a structured approach for identifying the underlying causes of a problem rather than merely addressing its symptoms. There are several RCA methods, each suited to different types of problems depending on factors such as the complexity of the system functions, data availability, and system maturity. Below is an overview of commonly used RCA methods, with guidance on when to use each. If you’d prefer to go straight to the summary and flowchart, feel free to scroll to the end of the post.
5 Whys
- Ask "Why?" repeatedly (typically five times) to drill down to the root cause of a problem. This method is best for simple to moderately complex problems where the cause-effect relationship is linear, for example a piece of test equipment that stopped working. It is commonly used for manufacturing, operations, and service issues.
- While it is easy to use, it can oversimplify the problem by following a single causal chain, so it may not be ideal for complex or multi-factor problems.
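A 5 Whys chain can be recorded as a simple ordered list, which makes the linear cause-effect assumption explicit. The sketch below is illustrative only: the `WhyStep` class, the `root_cause` helper, and every answer in the test-equipment chain are hypothetical and not taken from a real investigation.

```python
from dataclasses import dataclass


@dataclass
class WhyStep:
    question: str
    answer: str


def root_cause(chain: list[WhyStep]) -> str:
    """Return the final answer in the chain, treated here as the root cause."""
    if not chain:
        raise ValueError("the chain needs at least one why/answer pair")
    return chain[-1].answer


# Hypothetical chain for the test-equipment example.
chain = [
    WhyStep("Why did the test equipment stop working?", "The power supply shut down."),
    WhyStep("Why did the power supply shut down?", "It overheated."),
    WhyStep("Why did it overheat?", "The cooling fan was blocked by dust."),
    WhyStep("Why was the fan blocked?", "The air filter was never replaced."),
    WhyStep("Why was the filter never replaced?", "No maintenance schedule exists."),
]

for depth, step in enumerate(chain, start=1):
    print(f"{depth}. {step.question} -> {step.answer}")
print("Root cause:", root_cause(chain))
```

Because each step has exactly one answer, the structure cannot represent branching causes, which is the limitation noted above.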
Fishbone Diagram (Ishikawa or Cause-and-Effect Diagram)
- A visual tool that categorizes potential causes into groups such as Machines, People, Materials, Measurements, and Environment. It is best for brainstorming and organizing the potential causes of a problem, for example when asking why a product failed in a production environment.
- It is one of the best tools for visualizing and organizing thoughts, and it supports team collaboration. However, a fishbone diagram alone may not reach the root cause; deeper analysis is often needed.
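The output of a fishbone session can be captured as a mapping from category to candidate causes. The following is a minimal sketch: the category names come from the list above, while the effect, the individual causes, and the `render` and `count_causes` helpers are hypothetical.

```python
# Categories follow the text; the causes are hypothetical brainstorm output.
fishbone = {
    "Machines": ["Fixture worn", "Oven temperature drift"],
    "People": ["New operator not yet trained"],
    "Materials": ["Solder paste past shelf life"],
    "Measurements": ["Gauge out of calibration"],
    "Environment": ["High humidity on the line"],
}


def render(effect: str, bones: dict[str, list[str]]) -> str:
    """Format the diagram as indented text: effect, categories, then causes."""
    lines = [f"Effect: {effect}"]
    for category, causes in bones.items():
        lines.append(f"  {category}")
        lines.extend(f"    - {cause}" for cause in causes)
    return "\n".join(lines)


def count_causes(bones: dict[str, list[str]]) -> int:
    """Total number of candidate causes across all categories."""
    return sum(len(causes) for causes in bones.values())


print(render("Product failed in production", fishbone))
print("Candidate causes:", count_causes(fishbone))
```

The structure only organizes candidates. Deciding which one is the actual root cause still requires evidence from a follow-up method.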
Fault Tree Analysis (FTA)
- A top-down, deductive failure analysis that uses Boolean logic (AND/OR gates) to trace a top-level failure back to its possible causes. It is best for complex systems where failures can result from combinations of events. An application example is investigating a system-level failure in automotive electronics or aerospace.
- It is a detailed, structured approach and highly effective for safety-critical systems. However, the depth of analysis makes it time-consuming, and it requires technical expertise and detailed system knowledge.
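To make the gate logic concrete, here is a minimal sketch of a fault tree that estimates the probability of a top event from basic-event probabilities. It assumes all basic events are statistically independent and that no event appears in more than one branch; the class names, the tree, and the probabilities are hypothetical.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class BasicEvent:
    name: str
    probability: float

    def evaluate(self) -> float:
        return self.probability


@dataclass
class Gate:
    name: str
    kind: str  # "AND" or "OR"
    inputs: list = field(default_factory=list)

    def evaluate(self) -> float:
        probs = [node.evaluate() for node in self.inputs]
        if self.kind == "AND":
            # All inputs must occur (independence assumed).
            return prod(probs)
        if self.kind == "OR":
            # At least one input occurs (independence assumed).
            return 1.0 - prod(1.0 - p for p in probs)
        raise ValueError(f"unknown gate kind: {self.kind}")


# Hypothetical tree: the controller fails if the sensor fails,
# OR if both the primary and the backup power supplies are lost.
power_loss = Gate("Total power loss", "AND", [
    BasicEvent("Primary supply lost", 0.10),
    BasicEvent("Backup supply lost", 0.05),
])
top = Gate("Controller failure", "OR", [
    BasicEvent("Sensor failure", 0.01),
    power_loss,
])

print(f"P({top.name}) = {top.evaluate():.5f}")
```

Real fault trees often contain shared events and common-cause failures, in which case this simple bottom-up multiplication no longer holds and a minimal cut set analysis is the usual alternative.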
Failure Mode and Effects Analysis (FMEA)
- FMEA systematically evaluates possible failure modes within a system and their potential effects. It is best applied during the product or process design and development stages, for example when assessing risks in a new inverter design before component and system validation.
- It helps prioritize issues by risk, enabling teams to develop mitigation methods and control mechanisms. However, it adds little value if it is first applied after production has started.
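One common way to prioritize by risk in an FMEA is the Risk Priority Number (RPN), the product of severity, occurrence, and detection ratings, each typically scored from 1 to 10. The text above does not prescribe a scoring scheme, so treat RPN here as an assumption. The failure modes and ratings for the inverter example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    severity: int    # 1 (negligible) to 10 (hazardous)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (almost certainly detected) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes: list[FailureMode]) -> list[FailureMode]:
    """Sort failure modes from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical failure modes for a new inverter design.
modes = [
    FailureMode("Gate driver fault", severity=9, occurrence=3, detection=4),
    FailureMode("DC-link capacitor degradation", severity=7, occurrence=5, detection=6),
    FailureMode("Connector corrosion", severity=5, occurrence=4, detection=3),
]

for mode in prioritize(modes):
    print(f"{mode.rpn:4d}  {mode.description}")
```

RPN treats the three ratings as interchangeable, so a high-severity, low-occurrence item can rank below a minor nuisance. Many teams therefore review high-severity items separately regardless of their RPN.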
Pareto Analysis
- It applies the 80/20 rule, the observation that roughly 80% of problems stem from about 20% of causes, to identify the few causes that contribute most. It is best used to prioritize problems or causes (once identified using another RCA approach) by frequency of occurrence or impact, for example reducing defect rates during production by focusing on the top contributors.
- It is data-driven and easy to implement, but it does not identify the root cause directly.
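The calculation behind a Pareto chart is a sort followed by a cumulative percentage. The sketch below shows it under stated assumptions: the defect categories and counts are hypothetical, and the 80% cutoff is a conventional default rather than a fixed rule.

```python
def pareto(counts: dict[str, int]) -> list[tuple[str, int, float]]:
    """Return (cause, count, cumulative share) rows, largest cause first."""
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    rows, running = [], 0
    for cause, n in ranked:
        running += n
        rows.append((cause, n, running / total))
    return rows


def vital_few(counts: dict[str, int], threshold: float = 0.8) -> list[str]:
    """Smallest set of top causes whose cumulative share reaches the threshold."""
    selected = []
    for cause, _, cumulative in pareto(counts):
        selected.append(cause)
        if cumulative >= threshold:
            break
    return selected


# Hypothetical defect counts from a production line.
defects = {
    "Solder bridge": 45,
    "Missing component": 25,
    "Misalignment": 15,
    "Scratch": 10,
    "Other": 5,
}

for cause, n, cumulative in pareto(defects):
    print(f"{cause:<20}{n:>4}{cumulative:>8.0%}")
print("Focus on:", vital_few(defects))
```

The result says where to focus first, not why those defects occur, so each top contributor still needs its own root cause investigation.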
Kepner-Tregoe Analysis
- A structured, logical decision-making method that includes problem analysis, decision analysis, and potential problem analysis. It is best for complex problems requiring systematic decision-making and evaluation of alternatives. An example is evaluating why a multi-functional software system failed in a customer vehicle and determining the corrective action.
- It is a comprehensive and methodical approach, but it can be time-consuming and resource-intensive.
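The decision analysis step is often run as a weighted scoring exercise: alternatives that fail any mandatory criterion ("must") are dropped, and the rest are ranked by weighted scores on the desirable criteria ("wants"). The sketch below covers only that step, and the criteria, weights, scores, and corrective actions are all hypothetical.

```python
def decision_analysis(
    alternatives: dict[str, dict],
    weights: dict[str, int],
) -> list[tuple[str, int]]:
    """Rank alternatives that meet all musts by their weighted want scores."""
    results = {}
    for name, alt in alternatives.items():
        if not alt["musts_met"]:
            continue  # fails a mandatory criterion, so it is not scored
        results[name] = sum(weights[c] * alt["scores"][c] for c in weights)
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical weights (1-10) for the desirable criteria.
weights = {"effectiveness": 10, "time_to_deploy": 6, "cost": 4}

# Hypothetical corrective actions for the vehicle software failure.
alternatives = {
    "Over-the-air patch": {
        "musts_met": True,
        "scores": {"effectiveness": 8, "time_to_deploy": 9, "cost": 8},
    },
    "Dealer reflash": {
        "musts_met": True,
        "scores": {"effectiveness": 9, "time_to_deploy": 4, "cost": 5},
    },
    "Disable the feature": {
        "musts_met": False,  # assumed to violate a customer requirement
        "scores": {"effectiveness": 3, "time_to_deploy": 10, "cost": 9},
    },
}

for name, score in decision_analysis(alternatives, weights):
    print(f"{score:4d}  {name}")
```

The ranking is only as good as the weights and scores the team agrees on, which is where much of the time and effort in this method goes.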
8D (Eight Disciplines) Problem Solving and 3x5 Why Analysis
- A team-oriented, structured approach typically used in manufacturing and quality to solve complex problems. It is best for chronic, recurring issues or customer complaints in regulated industries (automotive, aerospace). An application example is identifying the root cause of repeated failures during system validation.
- Both methods cover root cause analysis, corrective action, and prevention. However, they require cross-functional teams to participate in the analysis.

Conclusion
Each RCA method has its strengths and limitations. The key to effective root cause analysis lies in selecting the right method based on the nature of the problem and the development phase of the system. Understanding these tools and applying them appropriately supports more effective diagnosis, better system performance, and improved cross-functional collaboration.
Footnotes
[1] Richmond, B. Systems Thinking: Four Key Questions. https://www.iseesystems.com/resources/articles/
[2] Arnold, R. D., & Wade, J. P. (2015). A Definition of Systems Thinking: A Systems Approach. Procedia Computer Science, Volume 44. https://doi.org/10.1016/j.procs.2015.03.050